-
Notifications
You must be signed in to change notification settings - Fork 285
[DOC] Add temp_storage_bytes usage guide #6208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| ../api/device | ||
|
|
||
|
|
||
| Determining Temporary Storage Requirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to also mention the single-phase API
1dc3643 to
11abc08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single phase APIs open up a whole new space of explanation for the documentation - the two phase clarifications though are much needed. Thanks a lot for taking the time to provide docs for it.
docs/cub/api_docs/device_wide.rst
Outdated
| * **Can be nullptr/uninitialized**: All input/output pointers (``d_in``, ``d_out``, etc.) | ||
| * **Note**: The algorithm does not access input data during the query phase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we actually providing this guarantee? Can you point at a place from which you derive this fact?
Hypothetical scenario: we could determine the temporary storage based on the alignment of another input pointer. AFAIK we don't do that, but currently, we could.
However, since we seem to be vage about what's required on all parameters that are not taking part in the query phase, maybe we should just define what's being suggested here. But that requires some broader approval and probably a review of all existing APIs.
@gevtushenko what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bernhardmgruber Thanks for flagging this great point. I re-audited the device-wide dispatch layer to make sure we're not overpromising. Every dispatcher we ship
(dispatch_reduce*.cuh, dispatch_scan*.cuh, dispatch_select_if.cuh,
dispatch_histogram.cuh, dispatch_radix_sort.cuh, dispatch_merge*.cuh,
dispatch_rle.cuh, dispatch_unique_by_key.cuh, dispatch_three_way_partition.cuh,
dispatch_topk.cuh, dispatch_adjacent_difference.cuh, dispatch_batch_memcpy.cuh)
exits immediately when d_temp_storage == nullptr- no kernels launch and no user pointers are dereferenced. I've updated the “What arguments are needed during the query phase?” bullets to call that out explicitly and list the audited dispatchers. Please let me know if you’d like me to add anything else or tighten the wording further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should list the specific implementations, but rather provide a general guarantee.
@gevtushenko can we agree that any arguments, except for the temporary storage pointer and size reference, are not inspected during a size query call of a CUB device API?
Add documentation explaining temp_storage_bytes query pattern.
Fixes #847